explorer: heatmap SQL pre-aggregation + adaptive radius (#233) #241
Merged
Two related changes that follow up PR isamplesorg#240 (heatmap phase 1):

1. SQL pre-aggregation removes the LIMIT 100000 cap honestly.
2. Adaptive per-point radius + maxOpacity caps avoid blur-overlap saturation at high-cell-count views (world view "everything red" symptom RY surfaced after isamplesorg#240 shipped).

## (1) SQL pre-aggregation

Previously: `SELECT latitude, longitude FROM lite WHERE bbox AND filters LIMIT 100000`, then bin per pixel in JS. Two problems:

- LIMIT 100000 returned the first 100k rows in parquet storage order: NOT random, NOT geographic. At world view, the heatmap silently showed whichever source happened to be physically first in the file (likely SESAR, the largest by row count). The "(capped)" status warning disclosed the problem but didn't fix it.
- For sample sets above the cap, the density was unfaithful.

Now: SQL computes pixel cell coords server-side using FLOOR / LEAST / GREATEST, then GROUP BY (x, y), returning one row per non-empty pixel with COUNT(*) as the count. Result cardinality is bounded by canvas pixels (≤ 512² = 262k), independent of how many samples the bbox contains. No LIMIT needed: every sample is counted into its true pixel bucket.

Antimeridian handled: when bbox wraps (west > east), SQL shifts longitudes < west by +360 so pixel arithmetic works in a continuous coordinate space.

Verified counts vs `samples table` summary line (= true sample count for the current view):

view              | heatmap  | table    | match
------------------|----------|----------|------
PKAP (100km)      | 77,840   | 77,840   | ✅
Cyprus medium     | 100,970  | 100,970  | ✅ (was capped at 100k)
Cyprus regional   | 682,029  | 682,029  | ✅ (was capped at 100k)
Africa (1.9Mkm)   | 12,875   | 12,875   | ✅
World view        | 5.98M    | 5.98M    | ✅ (was capped at 100k)

Render time at world view (~6M samples → 35k cells): ~7s on localhost, similar to or faster than the LIMIT 100k version.
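The binning and antimeridian shift described above can be sketched as follows. This is a hypothetical illustration, not the code from this PR: the function names (`unwrapBbox`, `binPoint`, `buildHeatmapBinSql`), the bbox shape `{west, east, south, north}`, the `n` column alias, and the linear latitude-to-row mapping are all assumptions. Only the `lite` table, the `latitude` / `longitude` columns, the 512-pixel canvas, and the FLOOR / LEAST / GREATEST + GROUP BY approach come from the description.

```javascript
// Hypothetical sketch, not the code from this PR.
const CANVAS_SIZE = 512; // heatmap canvas is 512 x 512 per the description

// Normalise a bbox so that east > west even when it crosses the antimeridian.
function unwrapBbox(bbox) {
  const wraps = bbox.west > bbox.east;
  return { ...bbox, wraps, east: wraps ? bbox.east + 360 : bbox.east };
}

// JS mirror of the SQL arithmetic: map one point to its pixel cell.
function binPoint(lat, lon, bbox, size = CANVAS_SIZE) {
  const b = unwrapBbox(bbox);
  const lonAdj = b.wraps && lon < b.west ? lon + 360 : lon; // shift by +360
  const fx = (lonAdj - b.west) / (b.east - b.west);
  const fy = (b.north - lat) / (b.north - b.south); // row 0 = north edge (assumed)
  const cell = (f) => Math.max(0, Math.min(size - 1, Math.floor(f * size)));
  return { x: cell(fx), y: cell(fy) };
}

// Build the aggregate query: one row per non-empty pixel, no LIMIT.
// `filterSql` is assumed to be an already-trusted SQL predicate.
function buildHeatmapBinSql(bbox, filterSql = "TRUE", size = CANVAS_SIZE) {
  const b = unwrapBbox(bbox);
  const lonExpr = b.wraps
    ? `(CASE WHEN longitude < ${b.west} THEN longitude + 360 ELSE longitude END)`
    : "longitude";
  const lonPred = b.wraps
    ? `(longitude >= ${b.west} OR longitude <= ${bbox.east})`
    : `longitude BETWEEN ${b.west} AND ${b.east}`;
  // Clamp to [0, size - 1] so points on the east/south edge stay on the canvas.
  const bin = (expr) =>
    `CAST(LEAST(${size - 1}, GREATEST(0, FLOOR(${expr} * ${size}))) AS INTEGER)`;
  return [
    "SELECT",
    `  ${bin(`(${lonExpr} - ${b.west}) / ${b.east - b.west}`)} AS x_bin,`,
    `  ${bin(`(${b.north} - latitude) / ${b.north - b.south}`)} AS y_bin,`,
    "  COUNT(*) AS n",
    "FROM lite",
    `WHERE latitude BETWEEN ${b.south} AND ${b.north}`,
    `  AND ${lonPred}`,
    `  AND (${filterSql})`,
    "GROUP BY 1, 2",
  ].join("\n");
}
```

Because the result has at most one row per pixel, summing `n` over the rows should reproduce the true sample count for the view, which is what the verification table checks. The sketch does not guard against a zero-width or zero-height bbox.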
`HEATMAP_LIMIT` constant left in place but no longer used (kept for back-compat in case phase 2 reintroduces a safety cell-count cap).

## (2) Adaptive radius + maxOpacity

After (1), RY tested staging and reported world view "everything is red." Cause: with 35k+ pixel cells on a 512² canvas, heatmap.js's default 25-pixel blur radius made each cell's Gaussian blur cover ~1% of canvas. 35k × 1% ≈ 350× the canvas area, so linear-additive blending saturated everything.

Two complementary fixes:

- `maxOpacity: 0.6` on the heatmap.js instance config. Caps the rendered alpha so dense areas don't fully wash out the satellite imagery underneath.
- Per-point radius computed from `sqrt(canvas_pixels / cell_count) * 2`, clamped to [6, 30]. World view (35k cells) → radius ≈ 6px (tight pixel dots, no overlap). Cyprus medium (~400 cells) → radius = 30px (cap, smooth blobs as before).

Together: world view shows geographic structure instead of solid red. Tight zooms unchanged visually.

## Test plan

- `tests/playwright/heatmap-overlay.spec.js` 5/5 still pass on localhost.
- Visual verified on rdhyee staging at the URLs RY surfaced (Africa-wide, Atlantic alt=15Mkm). World view now shows structure; tight zooms unchanged.

## Provenance

Authored by Claude, prompted by RY ("wondering whether we can do better geographic random sampling"). Approach (Option C from Claude's menu: SQL pre-aggregation by pixel cell) recommended over TABLESAMPLE because it removes the cap entirely rather than just making the sampling random.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Follow-up to PR #240 (heatmap phase 1). Two related changes:
1. Replaces the `LIMIT 100000` raw-row scan + JS per-pixel binning with a DuckDB `GROUP BY` that does the binning server-side. Removes the cap honestly: every sample in the bbox is counted, regardless of total sample count.
2. Adaptive per-point radius + `maxOpacity` cap, to avoid blur-overlap saturation at high-cell-count views (the world view "everything red" symptom).

Why #1: LIMIT 100000 was geographically biased

`LIMIT 100000` returned the first 100k rows in parquet storage order: not random, not geographic. At world view, the heatmap silently showed whichever source happened to be physically first in the file (likely SESAR, the largest source by row count). The "(capped)" status warning from #240 disclosed the problem but didn't fix it.

This PR pushes the binning into DuckDB. SQL computes `(x_bin, y_bin)` pixel coordinates server-side using `FLOOR` / `LEAST` / `GREATEST`, then `GROUP BY (x_bin, y_bin)`, returning one row per non-empty pixel with `COUNT(*)` as the count. Result cardinality is bounded by canvas pixels (≤ 512² = 262k), independent of bbox sample count. No `LIMIT` needed: every sample is counted into its true pixel bucket.

Antimeridian handled: when bbox wraps (`west > east`), SQL shifts `longitude < west` by +360 so pixel arithmetic works in a continuous coordinate space.

Verified counts vs the existing `samples table` summary line (= true sample count for the current view):

view              | heatmap  | table    | match
------------------|----------|----------|------
PKAP (100km)      | 77,840   | 77,840   | ✅
Cyprus medium     | 100,970  | 100,970  | ✅ (was capped at 100k)
Cyprus regional   | 682,029  | 682,029  | ✅ (was capped at 100k)
Africa (1.9Mkm)   | 12,875   | 12,875   | ✅
World view        | 5.98M    | 5.98M    | ✅ (was capped at 100k)

Render time at world view (~6M samples → 35k cells): ~7s on localhost, similar to or faster than the `LIMIT 100k` version. Status text always reports the true count; the `(capped)` branch is removed.

Why #2: adaptive radius
After (1), RY tested staging and reported world view "everything is red." Cause: with 35k+ pixel cells on a 512² canvas, heatmap.js's default 25-pixel blur radius made each cell's Gaussian cover ~1% of the canvas. 35k × 1% ≈ 350× the canvas area, so linear-additive blending saturated everything to full red.
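A rough back-of-envelope check of that saturation estimate, assuming each point's blur footprint can be approximated as a solid disc of the stated 25-pixel radius (a simplification, since the real footprint is a soft gradient). The variable names are illustrative only.

```javascript
// Back-of-envelope coverage estimate; disc approximation is an assumption.
const canvasPixels = 512 * 512;                   // 262,144 px
const blurArea = Math.PI * 25 * 25;               // ~1,963 px² per point at radius 25
const coveragePerCell = blurArea / canvasPixels;  // ~0.0075, i.e. on the order of 1%
const totalCoverage = 35000 * coveragePerCell;    // ~262x the canvas area

console.log(coveragePerCell.toFixed(4), totalCoverage.toFixed(0));
```

Under this approximation each cell covers about 0.75% of the canvas (rounded to ~1% above), so 35k cells stack to a few hundred layers of overlap per pixel on average.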
Two complementary fixes:
- `maxOpacity: 0.6` on the heatmap.js instance: caps the rendered alpha so dense areas don't fully wash out the satellite imagery.
- Per-point radius computed from `sqrt(canvas_pixels / cell_count) × 2`, clamped to `[6, 30]`. World view (35k cells) → radius ≈ 6 (tight pixel dots, no overlap saturation). Cyprus medium (~400 cells) → radius = 30 cap (smooth blobs as before).

World view now shows geographic structure instead of solid red. Tight zooms unchanged visually.
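The radius rule above can be sketched as a small pure function. This is an illustration under assumptions, not the PR's code: `adaptiveRadius` and `toHeatmapData` are hypothetical names, the row shape `{x_bin, y_bin, n}` is assumed, and applying one shared radius to every point in a view is one reading of "per-point radius".

```javascript
// Hypothetical sketch of the adaptive radius rule; names are illustrative.
function adaptiveRadius(cellCount, canvasSize = 512, minRadius = 6, maxRadius = 30) {
  if (!(cellCount > 0)) return maxRadius; // nothing to overlap with
  // sqrt(canvas_pixels / cell_count) * 2, clamped to [minRadius, maxRadius]
  const raw = Math.sqrt((canvasSize * canvasSize) / cellCount) * 2;
  return Math.min(maxRadius, Math.max(minRadius, raw));
}

// Turn aggregated rows (assumed shape: {x_bin, y_bin, n}) into a data payload
// with a radius attached to each point.
function toHeatmapData(cells, canvasSize = 512) {
  const radius = adaptiveRadius(cells.length, canvasSize);
  const max = cells.reduce((m, c) => Math.max(m, c.n), 0);
  const data = cells.map((c) => ({ x: c.x_bin, y: c.y_bin, value: c.n, radius }));
  return { min: 0, max, data };
}

console.log(adaptiveRadius(35000)); // world view: raw ~5.5, clamped up to 6
console.log(adaptiveRadius(400));   // Cyprus medium: raw ~51, clamped down to 30
```

A payload like this would then be handed to a heatmap.js instance created with `maxOpacity: 0.6`, for example via `setData`; the exact wiring in the explorer is not shown in this description.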
Test plan
- `tests/playwright/heatmap-overlay.spec.js` 5/5 pass on localhost

Out of scope

- `HEATMAP_LIMIT` constant (= 100,000) is kept in the code but no longer referenced; left in place for phase 2 in case a safety cap on cell count is reintroduced.

Provenance
Authored by Claude in response to RY feedback ("wondering whether we can do better geographic random sampling"). Approach (SQL pre-aggregation by pixel cell) chosen over TABLESAMPLE because it removes the cap entirely rather than just making the sampling random. Adaptive radius added in response to RY's second feedback that world view was washing out to red.
Cross-refs